Parsing the SynTagRus Treebank of Russian

Authors

  • Joakim Nivre
  • Igor Boguslavsky
  • Leonid L. Iomdin
Abstract

We present the first results on parsing the SynTagRus treebank of Russian with a data-driven dependency parser, achieving a labeled attachment score of over 82% and an unlabeled attachment score of 89%. A feature analysis shows that high parsing accuracy is crucially dependent on the use of both lexical and morphological features. We conjecture that the latter result can be generalized to richly inflected languages, provided that sufficient amounts of training data are available.

Similar resources

Building a Dependency Parsing Model for Russian with MaltParser and MyStem Tagset

The paper describes a series of experiments on building a dependency parsing model using MaltParser, the SynTagRus treebank of Russian, and the morphological tagger Mystem. The experiments have two purposes. The first one is to train a model with a reasonable balance of quality and parsing time. The second one is to produce user-friendly software which would be practical for obtaining quick res...


Conversion of SynTagRus (the Russian Dependency Treebank) to Universal Dependencies

This report presents the Universal Dependency (UD) annotated corpus for Russian and a conversion process which was developed to transform SynTagRus, the Russian dependency treebank, into a UD-style annotated corpus. The aim of this work was to create a UD-style annotated corpus for Russian since no such corpus was available prior to UD release 1.3. The conversion rules were based on manually an...


Converting SynTagRus Dependency Treebank into Penn Treebank Style

This paper presents the conversion of SynTagRus dependency structures into Penn Treebank style phrase structures, whose resulting data will be used to train a statistical constituency parser for Russian and create a large-scale constituency-parsed corpus. The implemented conversion includes various innovative features in order to create phrase structure trees that are closest to Penn Treebank s...


Automatic Error Correction in a Syntactic Treebank Using Transformation-Based Machine Learning

A treebank is one of the most useful resources for supervised or semi-supervised learning in many NLP tasks, such as speech recognition, spoken language systems, parsing, and machine translation. Treebanks can be developed in different ways, which can generally be categorized into manual and statistical approaches. While the resulting treebank in each of these methods contains annotation errors,...


Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free word order languages, such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...



Publication date: 2008